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CANONICAL CORRELATION ANALYSIS BETWEEN 

TIME SERIES AND STATIC OUTCOMES, WITH 
APPLICATION TO THE SPECTRAL ANALYSIS OF 
HEART RATE VARIABILITY 

By Robert T. Krafty*'"!" and Martica Hall"!" 

University of Pittsburgh 

Although many studies coUect biomedical time series signals from 
multiple subjects, there is a dearth of models and methods for as- 
sessing the association between frequency domain properties of time 
series and other study outcomes. This article introduces the random 
Cramer representation as a joint model for collections of time se- 
ries and static outcomes where power spectra are random functions 
that are correlated with the outcomes. A canonical correlation anal- 
ysis between cepstral coefficients and static outcomes is developed 
to provide a flexible yet interpretable measure of association. Esti- 
mates of the canonical correlations and weight functions are obtained 
from a canonical correlation analysis between the static outcomes and 
maximum Whittle likelihood estimates of truncated cepstral coeffi- 
cients. The proposed methodology is used to analyze the association 
between the spectrum of heart rate variability and measures of sleep 
duration and fragmentation in a study of older adults who serve as 
the primary caregiver for their ill spouse. 

1. Introduction. Scientific and technological advances have led to an 
increase in the number of studies that collect and analyze biological time se- 
ries signals from multiple subjects. In many instances, the frequency domain 
properties of the time series contain interpretable physiological information. 
Examples of such time series include electroencephalographic signals (Buysse 
et al., 2008; Qin and Wang, 2009), hormone concentration levels (Diggle and 
Al Wasel, 1997; Gronfier and Brandenberger, 1998), and heart rate variabil- 
ity (Hall et al., 2007; Krafty, Hall and Guo, 2011). The goal of many such 
studies is to quantify the association between power spectra and collections 
of correlated outcomes. 

This article is motivated by a study whose objective is to better under- 
stand the association between stress and sleep in older adults who are the 
primary caregiver for their spouse. In this study, heart rate variability and 
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multiple measures of sleep duration and fragmentation are collected from 
participants during a night of sleep. Heart rate variability is the measure of 
variability in the elapsed time between consecutive heart beats. Its power 
spectrum has been shown to be an indirect measure of autonomic nervous 
system activity and is used by researchers as a measure of stress (Task Force 
of the ESC/ASPE, 1996; Hall et al., 2007). Measures of sleep duration and 
fragmentation have been shown to be associated with health and function- 
ing when measured either subjectively through self-report sleep diaries or 
objectively through the collection of electrophysiological signals known as 
polysomnography (PSG) (McCall et al., 1995; Hall et al., 2008; Nock et al., 
2009; Silva et al., 2007; Vgontzas et al., 2010). We desire an analysis of these 
data that can illuminate the relationship between stress and sleep by quan- 
tifying the association between the spectrum of heart rate variability and 
both self-reported and PSG derived measures of sleep. 

The majority of methods for the spectral analysis of time series from mul- 
tiple subjects where spectra depend on static covariates deal exclusively with 
covariates that take the form of qualitative grouping variables (Shumway, 
1971; Diggle and Al Wasel, 1997; Brihinger, 2002; Fokianos and Savvides, 
2008; Stoffer et al., 2010). These methods are not applicable when the co- 
variates are quantitative variables such as measures of sleep duration and 
fragmentation. Krafty, Hall and Quo (2011) introduced the mixed effects 
Cramer representation as a model for time series data where subject-specific 
power spectra depend on covariates and can account for quantitative vari- 
ables. The mixed effects Cramer representation has two characteristics which 
limit its effectiveness for modeling and analyzing time series and correlated 
static outcomes. First, it assumes a semiparametric model for log-spectra 
conditional on static outcomes. As is the case in our motivating study, a 
semiparametric form is often unknown and a nonparametric model is re- 
quired. Second, it provides a measure of association between time series 
and a static outcome conditional on the other outcomes through a regres- 
sion coefficient. When the outcomes are correlated, extracting scientifically 
meaningful information from the conditional associations provided by the 
multiple regression coefficients can be challenging. In our motivating study, 
clinically useful information concerning the relationship between the spec- 
trum of heart rate variability and the multiple correlated measures of sleep 
duration and fragmentation requires parsimonious measures of association. 

To offer a nonparametric model and interpretable measures of association 
between time series and sets of correlated static outcomes, we introduce the 
random Cramer representation and ensuing canonical correlation analysis 
(CCA). The random Cramer representation considered in this article is a 
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nonparametric joint model for time series and sets of static outcomes where 
the transfer function of the time series is random and the subject-specific 
log-spectra are correlated with the static outcomes. Unlike the mixed effects 
Cramer representation of Krafty, Hall and Guo (2011), no conditional semi- 
parametric form for the log-spectrum is assumed. The theoretical framework 
introduced by Eubank and Hsing (2008) is used to define a CCA between the 
cepstral coefficients, or the Fourier coefficients of the log-spectrum (Bogert, 
Healy and Tukey, 1963), and the static outcomes. Estimates of canonical 
correlations and weight functions are obtained through a procedure which 
first estimates the cepstral coefficients via Whittle likelihood regression then 
performs a standard multivariate CCA between the estimated cepstral co- 
efficients and static outcomes. 

The rest of the article is organized as follows. Section 2 describes our mo- 
tivating study: the Age Wise Study. Section 3 introduces the random Cramer 
representation and CCA. The estimation procedure is developed in section 
4. Section 5 presents the results of a simulation study and the proposed 
method is applied to data from the AgeWise Study in section 6. A discus- 
sion is offered in section 7. 

2. The AgeWise Study. The mental and emotional stress faced by 
older adults who are the primary caregiver for their ill spouse places them 
at an increased risk for the development of disturbed sleep which can effect 
their health and functioning (McCurry et al., 2007). A goal of the AgeWise 
Study conducted at the University of Pittsburgh is to gain a better under- 
standing of the association between stress and sleep in older adults who are 
the primary caregiver for their ill spouse in order to inform the development 
of behavioral interventions to enhance their sleep. 

The participants in this project are = 46 men and women between 
60-89 years of age. Each participant serves as the primary caregiver for 
their spouse who is suffering from a progressive dementing illness such as 
Alzheimer's or advanced Parkinson's disease. Participants were studied dur- 
ing a night of in-home sleep through ambulatory PSG. The recorded PSG 
signals were used to compute objective measures of total sleep time (TST) 
as the number of minutes spent asleep during the night, sleep latency (SL) 
as the number of minutes elapsed between attempted sleep and sleep onset, 
and wakefulness after sleep onset (WASO) as the number of minutes spent 
awake between sleep onset and the final morning awakening. Upon awaken- 
ing, the participants completed a self-report sleep diary which was used to 
compute self-reported measures of TST, SL, and WASO. 

The ambulatory PSG included a modified 2-lead electrode placement to 
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Subj 1 
HRV 




Fig 1. Heart rate variability, bias-adjusted log-periodograms, and estimated log- spectra for 
two subjects during the first three minutes of stage 2 sleep. The sleep outcomes for these 
two subjects are displayed in Table 1. 

collect the electrocardiogram signal continuously throughout the night at a 
512 Hz sampling rate. The electrocardiogram signal was digitally stored for 
off-line processing, cleaned of artifacts, and used to identify the R-waves as 
the location of the upward deflection of the electrocardiogram signal asso- 
ciated with each heartbeat. Interbeat intervals were then computed as the 
number of milliseconds between each successive pair of R-waves to provide 
a measure of the elapsed time between consecutive heart beats. Visually 
scored sleep staging was temporally aligned with the interbeat intervals for 
each participant to allow for the isolation of the epochs of interbeat inter- 
vals during the first three minutes of uninterrupted stage 2 sleep. To ensure 
proper physiological interpretation in accordance with established guidelines 
for nonparametric spectral analysis, we analyze the heart rate variability se- 
ries generated by sampling the cubic interpolation of the interbeat intervals 
verses the R-waves at 2 Hz, resulting in time series of length T = 360 (Task 
Force of the ESC/ASPE, 1996). 

The data for two subjects in the form of heart rate variability time se- 
ries and six measures of sleep duration and fragmentation are displayed in 
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Table 1 

Time spent asleep (TST), sleep latency (SL), and wakefulness after sleep onset (WASO) 
as measured by polysomnography (PSG) and self-reported sleep diary (D) for the two 
subjects whose heart rate variability are displayed in Figure 1. All sleep outcomes are 

reported in minutes. 





Subj 1 


Subj 2 


PSG-TST 


394 


373 


PSG-SL 


11 


10 


PSG-WASO 


94.67 


78.33 


D-TST 


383 


403 


D-SL 


20 


2 


D-WASO 


45 


15 



the top panels of Figure 1 and in Table 1, respectively. The primary ob- 
jective of our analysis is to illuminate the relationship between stress and 
sleep by obtaining low dimensional and interpretable measures of the asso- 
ciation between the spectrum of heart rate variability at the start of stage 
2 sleep and self-reported and PSG derived measures of sleep duration and 
fragmentation. 

3. Measuring Association Between Log-Spectra and Static Out- 
comes. 

3.1. Random Cramer Representation. This article is concerned with quan- 
tifying the association between the spectrum of a second order stationary 
time series of length T, {Xji, . . . , Xj^}, and a P— dimensional vector of 
correlated outcomes, Zj, from j = 1,...,N independent subjects. In our 
motivating sleep study, the time series of length T = 360 are three minute 
epochs of heart rate variability from = 46 participants while Zj are P = 6 
dimensional vectors of TST, WASO, and SL as measured by self-report sleep 
diary and by PSG. 

The outcomes Zj are assumed to be independent and identically dis- 
tributed with fi^ — E(Zj) and non-singular covariance kernel 

Tz = E (Z, - fiz) {Zj - fizY ■ 

The time series are modeled through a random Cramer representation with 
a random mean uj and a random transfer function Qj that is correlated 
with Zj. The random transfer functions Qj are independent and identi- 
cally distributed complex valued random functions over M that are Hermi- 
tian, square-integrable over [0, 1], and have period 1. Formally, the random 
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Cramer representation for Xjt is 

Xjt = uj+ [ Qj{uj)e'^^''^^dKj{io) 
Jo 

where are mutually independent identically distributed mean-zero or- 
thogonal increment processes over [0,1] such that E|(iAj(a;)|^ = dio and 
Aj is independent of 0^', Zj/, and Uji for all j and j' . The time series 
{Xjt : t G Z} exists with probability one and is second order stationary. 

In many applications, such as our motivating study, scientific interest lies 
in the ratio of power at different frequencies. This is equivalent to looking at 
linear combinations of the log-spectrum. Consequently, we consider spectral 
properties on the log-scale and define the subject-specific log-spectrum for 
the jth subject as the random function 

Fjioo) = log|e,(a;)|2. 

To assure that the first two moments of Fj exist and are bounded, it is 
assumed that sup^gjj S |0j(a;)|^ < oo and inf^gK £' |0j (a;)|^ > 0. We focus 
on log-spectra which possess square integrable first derivatives and define 
F as the space of even functions with period 1 whose first derivatives are 
square integrable. 

When Fj S F with probability 1, the subject-specific log-spectra possess 
the cosine expansion 

oo 

Fj{uj) = fjo + ^ fjkV2 cos{2TTUjk) 

k=l 

fjO = / Fj{uj)du; 
Jo 

fjk = / Fj(uj)V2 cos {2Trujk)duj, k = l,2,... 
Jo 

where fj = {fjk ■ A; = 0, 1, ... ) is the subject-specific cepstrum (Bogert, 
Healy and Tukey, 1963). Our analysis will explore spectral properties of 
times series via the cepstral coefficients and will make use of the covariance 
and cross-covariance kernels 

Tf{k,k') = E[{fjk-iJ,k){fjk'-fJ'k')],k,k' = 0,l,... 
Tfz{k) = E[{f,k-l^k){Zj-tizy], A: = 0,1,... 

where Hk = E (fjk). 
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3.2. Canonical Correlation Analysis. To provide a parsimonious mea- 
sure of association between time series and static outcomes following a ran- 
dom Cramer representation, we will utilize the definition of CCA between 
two sets of second order random variables introduced by Eubank and Hsing 
(2008). We want to find successive linear combinations of cepstral coefficients 
and static outcomes that are maximally correlated. The first canonical cor- 
relation pi is defined as 



pj = supCov2^Qfc/jfc,/3'Zj 



\k=0 



over all Ofc G M, k = 0,1, . . . , and (3 G such that the random variables 
X^fcLo ^kfjk and (3'Zj have unit variance. The series ai = (aifc : A; = 0, 1, . . . ) 
and vector Bi where this maximum occurs are referred to as first canonical 
weights for the cepstrum and static outcomes while YlT=o'^^kfjk and B'^Zj 
are first canonical variables. For q = 2, . . . ,Q where Q is the minimum of P 
and the rank of Fj, the qth canonical correlation pq is defined as 

pI = supCov^ ^akfjk,P''Z>j 



\k=0 



over all Ofe G M, k = 0,1,..., and (3 G such that YlT=o^kfjk and 
/3'Zj have unit variance and are pairwise uncorrelated with 'YlV=o^q'kfjk 
and B'^,Zj for q' < q. The series = (a^^, : k = 0,1, . . .) and vector Bg 
where the maximum is achieved are referred to as qth weight functions for 
the cepstrum and static outcomes while YlT=o ^qkfjk and B^Zj are the qth. 
canonical variables. We will assume that Tfz^z^ is well defined and Hilbert- 
Schmidt as an operator from to the reproducing kernel Hilbert space with 
reproducing kernel T j. Under this regularity condition. Theorems 1 and 2 of 
Eubank and Hsing (2008) assure the existence of the canonical correlations, 
weight functions, and variables. 

When Yl'k^o ^ canonical variable can be represented as 

a linear function of the log-spectra Fj 



where 



oo „i 

y2aqkfjk= I Aq{u})Fj{uj)du} 

Jo 



oo 

Ag{uj) = flgo + agfc-v/2 cos(27rcjfc). 

k=l 
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When it exists, we will refer to Aq as the gth weight function for the log- 
spectrum. In our application, the goal is to estimate and analyze canonical 
correlations and the canonical weight functions for the static outcomes and 
log-spectra. Although the canonical variables for the cepstra do not always 
possess forms as integral function of log-spectra, in section 3.3 it is shown 
that they can always be approximated as such. 

3.3. Finite Approximation. The infinite dimensional formulation of the 
CCA given above is not conducive to real data applications. A finite di- 
mensional approximation can be obtained by noting that when Fj G F, the 
cepstral coefficients decay such that YlT=i^'^fjk ^ °° with probability 1. 
The decay of the cepstral coefficients was utilized by Bloomfield (1973) to 
offer a finite dimensional model for the log-spectrum by truncating the co- 
sine series at some K < T such that Fj{uj) ^ fjQ + J2k=i fjkV^cos{2TTLok). 
Under this approximation, the cepstrum can be represented as the X-vector 
fj = {fjo, . . . , fjK-i)' which has K x K covariance matrix Tj and K x P 
cross-covariance with the static outcomes Tjz- 

The CCA between the truncated cepstrum and static outcomes is a stan- 
dard multivariate CCA problem (Johnson and Wichern, 2002, Chapter 10.2). 
To find the canonical correlations pq and variables a'gfj, B'^^Zj between fj and 

Zj, define T]q to be the gth largest eigenvalue of V ^ T'jr^TjT z with 

associated eigenvector Vq where TJ is the Moore-Penrose generalized inverse 

of Tf . The canonical correlations pq and weight functions Sg and Bg can 
be computed as 



These canonical correlations and weight functions are approximations of the 
canonical correlations and weight functions defined in section 3.2. A direct 
consequence of Lemma 5 of Eubank and Hsing (2008) is that, as K ^ oo. 



Although the log-spectral weight function Aq does not necessarily exist, 
the log-spectral weight function from the finite approximation 

K-l 







k=l 



is always well defined. 



TIME SERIES CCA 



9 



4. Estimation. Estimates of Tf, Tjz and Tz will be plugged into 
(3.1) to obtain estimates of the canonical correlations and weight func- 
tions. The covariance of Zj is estimated with the standard estimator Tz = 

{N-i)-^Eti (z,-z) (Z, 



Z)' where Z = iV-^ J^f^^ Zj. 



To estimate Tf and Tfz, we consider the periodograms 



1-1 



t=i 



, J 



.N,i=l, 



which are approximately independent and distributed as when 
T is large (Krafty, Hall and Guo, 2011, Theorem 1). This large sample 
distribution of the periodogram leads to a Whittle likelihood (Whittle, 1953, 
1954) for truncated cepstral coefficients. The negative log- Whittle likelihood 
for the truncated cepstral coefficients of the jth subject is 

L(T-1)/2J 

li,K ifo, /X-I) = E A.^cos(2.../T)] 

i=l 



+fo+Y.fkV2cos{2Trke/T)\. 

k=l J 



We propose using Whittle likelihood regression to estimate T f and T fz with 

N 

f^ = (iv-i)-ij;(f,-f 

N 



fz 



zV 



where f, 



fjo, fjK^i ) minimizes Cjk and f = iV i f,-. A 



j=i '■J ■ 



Fisher's scoring algorithm for computing ij is given in the appendix. 

Estimates pq, Sg, Bg and Aq of the qth. canonical correlation and weight 
functions for cepstra, static outcomes, and log-spectra are then defined for 
q = I, . . . ,Q as 



Pq 


= VVq 


ag 






— 1- Z ^1 




K-1 


Aq{uj) 


= aqo + agfc\/2 COS {lirujk) 



k=l 
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where rjq is the qth largest eigenvalue of the PxP matrix T'j^T j-T fzT^ ' 
with associated eigenvector v^. The estimated truncated cepstral coeffi- 
cients also provide estimates of the subject-specific log-spectra as Fjiuj) = 
fjo + Ylk=i fjkV2cos{27rujk). 

These estimates depend on the number of non-zero cepstral coefficients 
K. Simulation studies have demonstrated favorable empirical performance 
of the AIC as a data driven procedure for selecting K by minimizing 

N 

C{k) = A'fc (/io, . • . , fjk-i) + 2Nk. 
i=i 

We provide Matlab code for implementing the proposed estimation pro- 
cedure in the supplemental file Krafty and Hall (2012b). 

5. Simulation Study. 

5.1. Setting. A simulation study was conducted to explore the empiri- 
cal properties of the proposed estimation procedure and compare it to two 
alternatives. For each simulated data set, log-spectra 

3 

Fj{u:) = 5 + \/2 cos {2ttuj) + + ^ CjfcV2 cos {2'Kkoj) 

k=l 

were simulated where ^jk are independent mean zero normal random vari- 
ables with Var (Cjk) = 4. Static outcomes Zj of dimension P = 3 were drawn 
as mean zero normal random vectors with covariance matrices diag(4, 4, 4) 
such that the elements of Zj are uncorrelated with ^j^, k = 0, ... ,3, ex- 
cept Corr {^j2, Zji) = 0.5 and Corr (^js, Zj2) = 0.25. Under this setting, the 
canonical correlations are pi = 0.5, p2 = 0.25, ps = 0, the weight functions 
for the log-spectra are Ai{ui) = cos{4ttu)/\/^, ^2(0;) = cos(67rcL!)/\/2, and 
the weight functions for the static outcomes are Bi = (0.5,0,0)', B2 = 
(0,0.5,0)'. After a replicate-specific log-spectrum was simulated, its square- 
root was calculated and used as the replicate-specific transfer function to 
simulate the conditionally Gaussian time series Xjt in accordance with The- 
orem 2 of Dai and Guo (2004). Five- hundred random samples were drawn 
for each of the six combinations of = 50, 100 and T = 30, 50, 100. Results 
from additional settings under varying levels of signal strength and smooth- 
ness are presented in the supplemental article Krafty and Hall (2012a). 

5.2. Estimation Procedures. In addition to the proposed cepstral based 
procedure, we also investigated two alternative estimation procedures adapted 
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from the functional CCA literature. The CCA between two functional valued 
variables has been explored by many researches including Leurgans, Moyeed 
and Silverman (1993), He, Miiller and Wang (2003, 2004), and Eubank and 
Hsing (2008). These methods can be adapted to our setting where one set 
of variables, Fj, is functional and observed with noise over a discrete grid 
through the periodograms, and the other, Zj, is multivariate. 

The first alternative estimation procedure considered is an adaptation of 
the algorithm for functional CCA presented in section 6 of Eubank and Hs- 
ing (2008). In this procedure, we used the penalized Whittle likelihood of 
Qin and Wang (2009) to obtain smoothing spline estimators of the subject- 
specific log-spectra Fj with smoothing parameters selected through direct 
generalized maximum likelihood. These estimated log-spectra were discretized 
to form vectors of estimated log-spectra at Fourier frequencies between 1 
and [(T — l)/2j; a canonical correlation analysis was performed between 
these vectors and Zj. The rank of the covariance kernel of the discretized 
log-spectra was selected through a cross validation procedure which seeks 
to optimize the first canonical correlation by maximizing the function CVi 
discussed in Section 2.5 of He, Miiller and Wang (2004). 

The second alternative estimation procedure is an adaptation of the em- 
pirical basis approach for functional CCA advocated by He, Miiller and 
Wang (2004). This procedure began by computing the singular value de- 
composition of the sample covariance of the bias-adjusted log-periodograms, 
log (Yji) + 7 where 7 ~ 0.577 is the Euler-Mascheroni constant. The eigen- 
vectors were smoothed to obtain a functional basis. The bias-adjusted log- 
periodograms were then projected onto a finite number of these basis func- 
tions and a multivariate CCA was performed between the projections and 
the static outcomes. The number of basis functions was selected through 
cross-validation by maximizing CVi (He, Miiller and Wang, 2004, Section 
2.5). 

5.3. Results. We assessed performance through the square error of es- 
timates of the canonical correlations, the square error of estimated weight 
functions for the static outcomes in the standard Euclidian norm, and the 
square error of the vector of estimated weight functions for the log-spectra 
evaluated at the Fourier frequencies in the standard Euclidian norm. The 
mean and standard deviation of the square errors are displayed in Table 2. 
The mean and variance of the errors of the proposed cepstrum based esti- 
mator were smaller than those of the two alternatives for each parameter 
under every setting. The most drastic benefit in the cepstral based proce- 
dure was found in the estimation of the canonical weight functions for the 
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Table 2 

Simulation Results: The mean (standard deviation) of the square error xlO^ of the 
estimators of the first two sets of weight functions and of the three canonical 
correlations. Three estimation procedures are implemented: Cep, the proposed cepstral 
based procedure; FDA, adaptation of the functional CCA algorithm presented by Eubank 
and Using (2008); EB, adaptation of the empirical basts approach to functional CCA 
presented by He, Miiller and Wang (2004)- 





Ai 


A2 


Bi 


Ba 


Pi 


P2 


P3 








N=100, T=100: 






Cep 

FDA 

EB 


0.27(0.75) 
1.87(3.48) 
1.17(1.65) 


0.79(2.47) 
2.26(4.37) 
2.17(2.91) 


1.28(2.19) 
1.60(2.37) 
1.73(2.87) 


3.32(3.67) 
4.66(4.18) 
4.64(4.15) 
N=100, T=50: 


0.57(0.66) 
0.80(0.94) 
1.09(1.62) 


0.85(1.07) 
1.24(1.58) 
1.79(1.94) 


1.81(1.62) 
1.98(2.57) 
2.87(3.37) 


Cep 

FDA 

EB 


0.57(1.11) 
2.03(3.50) 
1.74(2.43) 


1.67(3.50) 
3.03(4.86) 
3.34(4.24) 


1.30(2.08) 
1.66(2.46) 
1.61(2.56) 


3.45(3.67) 
4.63(4.49) 
4.48(4.24) 
N=100, T=30: 


0.58(0.68) 
0.80(0.91) 
1.15(2.03) 


0.84(1.02) 
1.16(1.44) 
1.72(1.94) 


1.93(1.68) 
1.97(2.41) 
2.79(2.96) 


Cep 

FDA 

EB 


0.87(0.83) 
1.99(3.18) 
1.65(2.23) 


1.61(2.38) 
3.20(4.52) 
3.27(4.27) 


1.42(2.21) 
1.51(2.06) 
1.62(2.57) 


3.75(4.03) 
4.77(4.65) 
4.44(4.34) 
N=50, T=100: 


0.54(0.64) 
0.80(1.00) 
1.19(2.29) 


0.84(1.09) 
1.04(1.19) 
1.44(1.73) 


1.97(1.67) 
2.03(2.75) 
2.54(2.89) 


Cep 

FDA 

EB 


0.65(2.47) 
4.26(7.16) 
1.53(2.22) 


0.98(2.89) 
4.33(7.41) 
2.07(2.83) 


2.54(3.32) 
3.77(3.93) 
3.44(4.31) 


5.64(4.94) 
7.20(5.14) 
7.24(4.88) 
N=50, T=50: 


1.46(1.68) 
2.36(2.47) 
2.62(3.12) 


1.85(2.19) 
2.77(3.07) 
3.94(3.52) 


3.02(2.74) 
3.94(5.69) 
4.82(6.52) 


Cep 

FDA 

EB 


1.13(2.48) 
4.63(7.05) 
2.64(3.61) 


1.93(3.71) 
4.83(6.75) 
3.32(4.00) 


2.56(3.38) 
3.70(4.14) 
3.60(4.29) 


5.79(4.92) 
7.30(5.17) 
6.86(4.81) 
N=50, T=30: 


1.48(1.70) 
2.55(2.57) 
2.63(3.00) 


1.93(2.18) 
2.81(3.20) 
3.79(3.42) 


3.25(2.89) 
4.52(5.63) 
4.95(6.11) 


Cep 

FDA 

EB 


1.39(1.71) 
4.52(6.39) 
3.12(4.31) 


1.84(2.38) 
5.10(6.65) 
4.04(5.05) 


2.67(3.33) 
3.77(4.05) 
3.50(4.21) 


5.85(4.95) 
6.88(5.05) 
6.95(4.92) 


1.40(1.73) 
2.55(2.63) 
2.54(2.92) 


1.99(2.20) 
2.83(3.41) 
3.50(3.30) 


3.26(2.94) 
4.52(5.90) 
4.55(5.80) 



log-spectra. The modification of the function CCA algorithm of Eubank and 
Hsing (2008) had smaller error in estimating the canonical correlations as 
compared to the empirical basis approach while the empirical basis approach 
demonstrated better performance in estimating the weight functions of the 
log-spectra. 

6. Analysis of Data from the Age Wise Study. 

6.1. Data Analysis. We analyzed the data from the project described in 
section 2 that consist of time series of heart rate variability during the first 
three minutes of stage 2 sleep and P = 6 measures of sleep duration and 
fragmentation from = 46 participants. The mean, standard deviation, 
and correlation matrix of the sleep variables are displayed in Table 3. To 
aid in the interpretation of the weight functions, we standardized the six- 
dimensional vector of sleep outcomes. The proposed procedure estimated the 
first two canonical correlations as pi = 0.52 and p2 = 0.19; the remaining 
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Table 3 

Correlation matrix, means, and standard deviations of the six sleep variables: time spent 
asleep (TST), sleep latency (SL), and wakefulness after sleep onset (WASO) as measured 
by polysomnography (PSG) and self-reported sleep diary (D). 



Correlation 


PSG- 


PSG- 


PSG- 


D- 


D- 


D- 






TST 


SL 


WASO 


TST 


SL 


WASO 




PSG-TST 


1.00 


-0.12 


-0.11 


0.45 


-0.08 


0.17 




PSG-SL 


-0.12 


1.00 


-0.31 


0.06 


-0.04 


-0.22 




PSG-WASO 


-0.11 


-0.31 


1.00 


-0.19 


0.13 


0.38 




D-TST 


0.45 


0.06 


-0.19 


1.00 


-0.51 


-0.51 




D-SL 


-0.08 


-0.04 


0.13 


-0.51 


1.00 


0.40 




D-WASO 


0.17 


-0.22 


0.38 


-0.51 


0.40 


1.00 








PSG- 


PSG- 


PSG- 


D- 


D- 


D- 






TST 


SL 


WASO 


TST 


SL 


WASO 


Mean (minutes) 




377.4 


18.3 


81.8 


383.2 


21.1 


36.9 


Standard Deviation (minutes) 


66.4 


19.9 


39.3 


89.9 


31.9 


52.8 



higher order correlations were estimated to be less than 3%. Figure 2 dis- 
plays the estimated weight functions Ai,A2 of the log-spectra while Table 4 
displays the weight functions Bi,B2 of the standardized sleep variables. 

The estimated first canonical weight function for the standardized sleep 
outcomes is negative for all sleep measures expect for PSG derived WASO, 
which is close to zero. Note that the sum of SL, TST, and WASO measures 
the total time in bed so that large values for the first canonical variable for 
the standardized sleep outcomes are associated with less time spent in bed. 
The weight function for the first canonical variable for the log-spectrum of 
heart rate variability is positive at all values. Consequently, the first canoni- 
cal variable for the log-spectrum is a measure of total power or total variance. 

The estimated second canonical weight function for the standardized sleep 
variables is a contrast between the amount of time spent awake during the 
night, as measured by both diary and PSG, and the amount of time asleep 
during the night, as measured primarily by self-report diary. The estimated 
second canonical weight function for the log-spectrum is positive for fre- 
quencies less than 0.17 Hz and negative for frequencies greater than 0.17 
Hz. Recall that our analysis is on the log-scale so that the estimated sec- 
ond canonical variable is a ratio on the natural scale of power from low 
frequencies to power from high frequencies. The two subjects whose data 
are displayed in Figure 1 and Table 1 exemplify this association. Subject 1 
displays a larger ratio of power between low and high frequencies as com- 
pared to subject 2 with an estimated second canonical variable of 4.13 as 
compared to -0.11. All sleep variables for subject 1 are larger than those 
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Hz Hz 



Fig 2. Estimated canonical weight functions for the log-spectrum of heart rate variability. 

Table 4 

Estimated canonical weight functions of the standardized sleep variables: time spent 
asleep (TST), sleep latency (SL), and wakefulness after sleep onset (WASO) as measured 
by polysomnography (PSG) and self-reported sleep diary (D). 





Bi 


B2 


PSG-TST 


-0.42 


-0.01 


PSG-SL 


-0.52 


0.17 


PSG-WASO 


0.09 


0.33 


D-TST 


-0.41 


-0.75 


D-SL 


-0.55 


0.42 


D-WASO 


-0.51 


0.30 



for subject 2 aside from diary assessed TST. Consequently, the estimated 
second canonical variable for subject 1, 0.38, is larger than that for subject 
2, -0.14. 

The adaptation of the functional CCA method of Eubank and Hsing 
(2008) that was explored in the simulation study was also implemented. 
The subject-specific log-spectra were estimated using the smoothing spline 
of Qin and Wang (2009) while the rank of the log-spectral covariance matrix 
was selected through CVi (He, Miiller and Wang, 2004, section 2.5.1). This 
procedure estimated the first canonical correlation as 46% and all higher 
order canonical correlations as zero. The estimated first weight functions 
for both the log-spectra and sleep outcomes were similar to the estimates 
obtained through the proposed cepstral based procedure. However, this pro- 
cedure estimated the second canonical correlation as zero and consequently 
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did not produce estimates of the second canonical weight functions. 

6.2. Results. The autonomic nervous system is classically divided into 
two dynamically balanced branches: the parasympathetic branch and the 
sympathetic branch. The parasympathetic branch is responsible for the 
maintenance of the body at rest while the sympathetic branch is associ- 
ated with the fight-or-flight response. Increased modulation of the sympa- 
thetic nervous system is associated with increased power in the spectrum 
of heart rate variability at low frequencies while increased modulation of 
the parasympathetic nervous system is associated with increases in power 
at both low and high frequencies (Task Force of the ESC/ASPE, 1996). 

The estimated first canonical variables suggest that less time in bed is 
associated with increased modulation of the parasympathetic nervous sys- 
tem. Excessive time spent in bed has been shown to be associated with 
increased mortality and has led to the advocacy of sleep restriction in older 
adults (Youngstedt and Kripke, 2004). The causal pathway through which 
excessive time in bed is associated with mortality is unknown and identi- 
fying possible confounders and causal intermediates of this relationship to 
inform future studies is a topic of interest (Patel et al., 2006). Diminished 
parasympathetic nervous system activity while at rest has also been linked 
to mortality (Ponikowski et al., 1997; Lanza et al., 1998). The estimated 
first canonical correlation suggests that future studies might be able to illu- 
minate the pathway through which time in bed is connected to mortality by 
exploring the role played by the modulation of the parasympathetic nervous 
system. 

The estimated second canonical variable for sleep is a contrast between 
the time spent initiating and maintaining sleep relative to the amount of 
perceived sleep and may be viewed as a measure of nocturnal wake-sleep 
balance. Negative values represent less wakefulness relative to perceived 
sleep; this profile is observed in healthy individuals without clinical sleep 
disturbances (Walsleben et al., 2004). In contrast, positive wake-sleep bal- 
ance values represent more wakefulness relative to perceived sleep as often 
observed in individuals with sleep disturbances such as insomnia (Carskadon 
et al., 1976). The estimated second canonical variable for the log-spectrum is 
a measure of the sympathovagal balance. Increased sympathovagal balance 
during sleep has been shown to be associated with symptoms of depres- 
sion and perceived stress (Hall et al., 2004, 2007). Consequently, the second 
canonical correlation suggests this simple one-dimensional measure of the 
wake-sleep cycle might be useful in informing studies to develop and evalu- 
ate behavioral therapies for improving the sleep of older adults. 
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7. Discussion. This article considered an approach to analyzing the 
association between the second-order spectrum of a time series and a set 
of static outcomes. The random Cramer representation provided a formal 
model for these data while the cepstrum based CCA provided an inter- 
pretable means of quantifying the association. This approach was motivated 
by and used to analyze the association between heart rate variability during 
sleep and measures of sleep duration and fragmentation in a population of 
adults who are the primary caregiver for their ill spouse. The analysis sug- 
gested a connection between stress and sleep which can serve as a guide for 
designing behavioral interventions to enhance the lives of caregivers. 

The work presented in this article represents one of the first approaches 
to analyzing a collection of time series whose power spectra depend on a 
set of correlated outcomes and is by no means exhaustive. This article only 
considered time series that are second order stationary. Many studies which 
collect heart rate variability are interested in the time-dependent spectral 
properties of long-term epochs which are non-stationary (Task Force of the 
ESC/ASPE, 1996). A topic of future research will be the extension of the 
random Cramer representation to the locally stationary setting through the 
use of a time- varying stochastic transfer function and the development of a 
time dependent cepstral coefficient based CCA. 

The CCA considered in this paper was used as a tool for exploratory anal- 
ysis. One might also be interested in inference on the canonical correlations 
and weight functions. Theorem 2 of Dai and Guo (2004) provides a method 
for simulating a time series with a given smooth spectral density function. 
Another topic of future research is the development of this sampling method 
to formulate a bootstrap procedure for performing inference on the canonical 
correlations and weight functions. 



The cepstral estimates ij that minimize Cjx can be computed through 
Fisher scoring. To formulate the algorithm, define the X-vectors 



so that the negative log- Whittle likelihood for the jth subject can be written 



APPENDIX: FISHER SCORING ALGORITHM 




f = 



ifo, Ik-i)' 



as 



L(T-1)/2J 
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The algorithm is defined iteratively where the estimated cepstral coefficients 
for the jth subject in the (m + l)st iteration are 

fm+l ^ |m ^ (^f™) U (f™) 

for score function 

L(T-1)/2J 

and Fisher information matrix 

L(T-1)/2J 

= - ^ QC^. 

£=1 

The algorithm continues until the change in the minimized negative log- 
Whittle likelihood is below some pre-selected threshold. We initialize the 
algorithm with the log-periodogram least squares estimators 

fO = (C'C)"'C'L,-. 

where Lj = [log(Y,-i) + 7, . . . , log(yjL(T-i)/2j ) + t]' and C is the [{T - 
l)/2j X K matrix with ^th row C^. 
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SUPPLEMENTARY MATERIAL 

Supplement A: Additional Simulation Results 

(doi: ???; .pdf). The pdf file contains the results from a more comprehensive 
simulation study. 

Supplement B: Matlab Code 

(doi: ???; .zip). The zip file contains Matlab code to run the proposed CCA 
and a file demonstrating its use. 
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